
A Trilogy of Posts on Code Optimization

This is the first post of a trilogy focused on code optimization.

I’ll draw from my experience developing scientific software in both academia and industry to share practical techniques and tools that can help you streamline your R workflows without sacrificing clarity. By the end of this trilogy, you’ll have a solid understanding of the foundational principles of code optimization and when to approach it—without overcomplicating things.

I hope the principles presented here will help you write code that’s not just faster but also sustainable and practical to maintain.

Let’s dig in!

A Fine Balance

Whether you’re playing with large data, designing spatial pipelines, or developing scientific packages, at some point everyone writes a regrettably sluggish piece of junk code.

The simplest fix is obvious enough: throw more money at your cloud provider, or upgrade your rig, and get MORE!

Kylo Ren says MORE! More cores, more RAM, more POWER! Because who doesn’t love bragging about that shit? I surely do!



On the other hand, money is expensive (duh!), and computing has a serious environmental footprint. With this in mind, it’s also good to remember that we went to the Moon and back on less than 4 KB of RAM, so there must be a way to make our junk code run in a more sustainable manner.

This is where code optimization comes into play!

Optimizing code is not just about making it faster; it is about making it efficient for developers, users, and machines alike. For us, pitiful carbon-based blobs, readable code is easier to wield, ergo efficient. For a machine, efficient code runs fast and has a small memory footprint. And there is an inherent tension there: optimizing for computational performance alone often comes at the cost of readability, while clean, readable code can sometimes slow things down. That’s why optimization requires strategic choices. Before diving headfirst into code optimization, it’s crucial to understand the dimensions of code efficiency and when optimizing is actually worth it.

The Dimensions of Code Efficiency

Code efficiency is an abstract concept involving a complex web of causes and effects. Laying the whole thing bare is beyond the scope of this post, but I believe that understanding some of the foundations may help articulate successful code optimization strategies.

Let’s take a look at the diagram below.

Pillars of Code Efficiency

On the left, there are several major code features that we can tweak to improve (or worsen!) the efficiency of our code. Changes in these features are bound to shape its efficiency landscape in often divergent ways.

Code Features

The following code features fundamentally shape how efficient our programs can be:

  • Programming Language: The choice between compiled and interpreted languages affects execution speed and memory usage.
  • Simplicity and Readability: Clean, maintainable code reduces cognitive load and developer time.
  • Algorithm Design and Data Structures: Well-designed algorithms and appropriate data structures determine how efficiently code scales with data size.
  • Hardware Utilization: Techniques like vectorization, parallelization, and memory management determine how well code leverages available computational resources.
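
To make the hardware-utilization point concrete, here is a minimal sketch in base R comparing an interpreted loop with vectorized arithmetic, which runs in compiled code (the function names `slow_add` and `fast_add` are just illustrative):

```r
# Adding two numeric vectors of length one million, two ways.
x <- runif(1e6)
y <- runif(1e6)

# Explicit loop: interpreted, one element at a time
slow_add <- function(x, y) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    out[i] <- x[i] + y[i]
  }
  out
}

# Vectorized: a single call into compiled code
fast_add <- function(x, y) {
  x + y
}

identical(slow_add(x, y), fast_add(x, y))  # TRUE: same result, much less time
```

Both versions perform the same element-wise additions, but the vectorized one delegates the loop to compiled code, which is typically orders of magnitude faster in R.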

Each of these features will be explored in detail in the next articles of this trilogy.

The Effects

These foundational choices impact three key performance dimensions:

  • Execution Speed (Time Complexity): The time required to run the code.
  • Memory Usage (Space Complexity): Peak memory usage during run time.
  • Input/Output Efficiency: How well the code handles file access, network usage, and database queries.
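
As a quick illustration of the first two dimensions, base R can measure both without extra packages, using `system.time()` for elapsed time and `object.size()` for the memory footprint of a result:

```r
x <- runif(1e6)

# Execution speed: elapsed seconds for a sort
timing <- system.time(s <- sort(x))
timing["elapsed"]

# Memory usage: footprint of the result (~8 MB for one million doubles)
print(object.size(s), units = "MB")
```

Note that `object.size()` reports the size of a finished object, not the peak memory used while computing it; measuring peak usage requires a memory profiler.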

At a higher level, two emergent properties arise:

  • Scalability: How well the code adapts to increasing workloads and larger infrastructures.
  • Energy Efficiency: The trade-off between computational cost and energy consumption.

Code optimization is a multidimensional trade-off. Improving one aspect often affects others. For example, speeding up execution might increase memory usage, and parallelization can create I/O bottlenecks. There’s rarely a single “best” solution, only trade-offs based on context and constraints.
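
A classic example of trading memory for speed is memoization: we cache the results of an expensive function so repeated calls with the same input return instantly. Here is a minimal sketch in base R (`memoize` and `slow_square` are illustrative names, and `Sys.sleep()` stands in for real expensive work):

```r
# Wrap a one-argument function so repeated calls with the same input
# are served from an in-memory cache instead of being recomputed.
memoize <- function(f) {
  cache <- list()
  function(x) {
    key <- as.character(x)
    if (is.null(cache[[key]])) {
      cache[[key]] <<- f(x)  # spend memory...
    }
    cache[[key]]             # ...to save time on repeated calls
  }
}

slow_square <- function(x) { Sys.sleep(0.1); x^2 }  # stand-in for expensive work
fast_square <- memoize(slow_square)

fast_square(4)  # first call pays the full cost
fast_square(4)  # repeat call returns instantly from the cache
```

The speedup is real, but so is the cost: the cache grows with every distinct input, which is exactly the kind of trade-off this section is about.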

To Optimize Or Not To Optimize

If for some reason you find yourself in the conundrum expressed in the title of this section, then you might find solace in the First Commandment of Code Optimization.

“Thou shalt not optimize thy code.”

Also known in some circles as the YOLO Principle, this commandment reveals the righteous path! If your code is reasonably simple and works as expected, you can call it a day and move on, because there is no reason whatsoever to attempt any optimization. This idea aligns well with a principle enunciated long ago:

“Premature optimization is the root of all evil.” — Donald Knuth, “Structured Programming with go to Statements” (1974)

Premature optimization happens when we let performance considerations get in the way of our code design. Designing code is a taxing task already, and designing code while simultaneously trying to make it efficient is even harder! Devoting a non-trivial fraction of our mental bandwidth to optimization results in code more complex than it needs to be, and increases the chance of introducing bugs.

That said, there are legitimate reasons to break the first commandment. Maybe you are bold enough to publish your code alongside a paper (Reviewer #2 says hi), release it as a package for the community, or simply share it with your data team. In these cases, the Second Commandment comes into play.

“Thou shalt make thy code simple.”

Optimizing code for simplicity isn’t just about aesthetics; it’s about making it readable, maintainable, and easy to use and debug. In essence, this commandment ensures that we optimize the time required to interact with the code. Any code that saves the time of users and maintainers is efficient enough already!

This post is not focused on code simplicity, but a few evergreen principles go a long way: use descriptive names, keep functions small and single-purpose, avoid deep nesting, and don’t repeat yourself.

Beyond these tips, I highly recommend the book A Philosophy of Software Design, by John Ousterhout. It helped me find new ways to write better code!

At this point we have clean and elegant code that runs once and gets the job done. Great! But what if it must run thousands of times in production? Or worse, what if a single execution takes hours or even days? In these cases, optimization shifts from a nice-to-have to a requirement. Yep, there’s a commandment for this too.

“Thou shalt optimize wisely.”

At this point you might be at the ready, fingers on the keyboard, about to deface your pretty code for the sake of sheer performance. Just don’t. This is a great point to stop, go back to the whiteboard, and think carefully about what you actually need to do. You gotta be smart about your next steps!

Here are a couple of ideas that might help you get smart about optimization.

First, keep the Pareto Principle in mind! It says that, roughly, 80% of the consequences result from 20% of the causes. Applied to code optimization, this principle translates into a simple fact: most performance issues come from a small fraction of the code. From there, the best course of action is to identify these critical code blocks (more about this later) and focus our optimization efforts on them. Once you’ve identified the real bottlenecks, the next step is making sure your optimizations don’t introduce unnecessary complexity.
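
Even before reaching for a full profiler such as `Rprof()` or the profvis package, a crude but effective first pass is to time each stage of a pipeline separately and see where the 20% lives. The two stages below are made up purely for illustration:

```r
# A toy two-stage pipeline: one cheap step, one expensive step.
stage_load    <- function() runif(1e5)  # cheap: simulate loading data
stage_process <- function(x) {          # costly: simulate heavy per-element work
  sapply(x[1:2000], function(v) sum(runif(500)) * v)
}

t_load    <- system.time(x <- stage_load())[["elapsed"]]
t_process <- system.time(y <- stage_process(x))[["elapsed"]]

# The stage with the largest share of total time is where optimization pays off.
round(c(load = t_load, process = t_process) / (t_load + t_process), 2)
```

Once this rough map points at a culprit, a proper profiler can break that stage down line by line.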

Second, beware of over-optimization. Taking code optimization too far can do more harm than good! Over-optimization happens when we keep pushing for marginal performance gains at the expense of clarity. It often results in convoluted one-liners and obscure tricks to save milliseconds that will confuse future you while making your code harder to maintain. Worse yet, excessive tweaking can introduce subtle bugs. In short, optimizing wisely means knowing when to stop. A clear, maintainable solution that runs fast enough is often better than a convoluted one that chases marginal gains.
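
As a tiny illustration of that difference, here are two equivalent ways to average the numeric columns of a data frame; the one-liner saves two lines of code and costs every future reader a pause:

```r
df <- data.frame(a = 1:3, b = 4:6, label = c("x", "y", "z"))

# Convoluted: everything crammed into one expression
m1 <- rowMeans(df[vapply(df, is.numeric, logical(1))])

# Clear: each step has a name and can be inspected on its own
numeric_columns <- vapply(df, is.numeric, logical(1))
m2 <- rowMeans(df[numeric_columns])

identical(m1, m2)  # TRUE: same result, different cost for the reader
```

Both run in the same time here; the only thing the one-liner optimizes away is readability.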

Beyond these important points, there is no golden rule to follow here. Optimize when necessary, but never at the cost of clarity!

What’s Next

In the next article of this trilogy, we’ll dive deeper into the code features that shape efficiency: programming languages, simplicity and readability, and algorithm design. We’ll explore how high-level design choices impact the performance and maintainability of your code.

Stay tuned!